Hopper Hierarchical Flow Improves Turnaround in Physical Design of Large IC


By Paul Rodman
Integrated System Design
Posted 07/11/01, 11:49:33 AM EDT

Today's problems in chip design are related to flow, not tools. Building an in-house flow -- the successful interplay of tools, data and people -- has become increasingly difficult because there aren't enough skilled people even as physical designs (such as SoCs) keep growing more complex. And if that isn't enough, there are deep-submicron semiconductor processes to consider, as well as the profusion of tools that have become mandatory. It's clear that engineers need more than just a bag of tools; they need a starting point built upon the collective flow and software expertise from the best-in-class design community.

Take the flow we used to design a high-performance graphics chip for 3dfx Interactive. It's based on proprietary physical-design automation software called Hopper. Hopper makes practical a new automated physical-design flow that offers:

Abutted hierarchical design,
Concurrent design,
Automation of all tasks and easier “what if?” experimentation.

Hopper is built on top of commercially available tools such as Avanti's Apollo, Hercules and StarRC-XT, as well as proprietary tools that perform such tasks as repeater insertion and clock distribution. Hopper is essentially an automation engine that enables ReShape to capture physical design flow know-how, with tool-specific knowledge (the best default settings) and design-specific knowledge (tool parameters and event ordering for a specific chip; see Fig. 1, page 28).

Our challenge was to make Hopper meet the following characteristics of the 3dfx graphics chip:

fabbed using Taiwan Semiconductor Manufacturing Co.'s 0.18-micron process
six-layer metal
1.5 million placeable objects
30 million transistors
200 RAMs, four PLLs, three D/A converters, two AGPs
18 blocks (12 core, four pad-ring blocks)
18 different clocks, up to 533 MHz (typically 200 to 350 MHz)
largest block, 250,000 placeable objects
over 10,000 repeaters added.

Designs like these are growing so quickly that they are outpacing EDA tool capacity, which makes hierarchical physical design a necessity. The smaller netlists that result from a hierarchical approach translate to shorter run-times, higher tool reliability (fewer core dumps), improved quality of results and greater determinism from run to run.

But more important, in a hierarchical approach the design team gains the benefits of block-level parallelism, which makes it possible to employ people and tools more effectively: different designers can work in parallel on the same chip.

A hierarchical flow also supports more deterministic results from run to run. By definition, dividing the chip into blocks limits the potential dispersion of cells and therefore reduces the potential for radical timing changes or congestion changes. However, in a traditional flat flow there is no guarantee that those cells will be located within the same appropriate proximity; therefore, every run with minor changes may cause new problems to surface.

One disadvantage of hierarchical design is that many optimizations don't occur, because the blocks are separated and the changes that must be made are less apparent to the engineer. This is the “horizon effect,” and it can result in poor-quality results. A number of tasks suffer from the horizon effect:

pin assignment
circuit rules (e.g., max transition)
timing problems
verification problems (e.g., antenna rules)
clock distribution
power distribution.

ReShape's flow performs hierarchical physical design without these problems. Because our flow uses previous outputs as the input of the next run, along with the latest changes, we are able to see how the blocks fitted together last time and leverage that history to refine our layout. The traditional flow tries to produce the optimal layout from one run; ours allows a series of runs that produce more optimized layouts each time. In a sense, the tools become smarter with each run and are able to use the previous layouts to avoid the horizon effect.

A traditional hierarchical flow relies on channels -- open lanes of empty space left between all the blocks -- to provide a pathway for connections for last-minute design fixes. Channels are undesirable for three reasons:

They create potential coupling problems because they implicitly bundle wires. This increases the risk the chip will not run at speed, and it may not run at all.
Top-level nets travel longer distances since they must go around, rather than through, the blocks to reach their destination. These longer distances can negatively affect timing.
They waste chip area.

ReShape's flow solves the channel problem by removing channels and using abutted blocks (Fig. 2, page 30). This optimizes block interconnections because signals cut through the blocks, utilizing the extra metal resources within those blocks. Consequently, there are no spaces between blocks. This allows for a more compact physical design and shorter wires, resulting in shorter paths, greater reliability and faster operation.

Hierarchical design enables concurrent physical and logic design. In a traditional flow, the back-end design team must wait until the front-end RTL design team has finished, resulting in both schedule and quality problems. For example, several logical blocks of a chip may already be complete while completion of others may be months away. In a typical flat methodology, the physical-design team would not have access to a complete netlist and therefore experimentation -- with meaningful results -- would not be possible.

Concurrent design

Without back-end feedback, the functional-design team can unknowingly create problems that are too difficult or costly to solve once the chip goes to physical design. But concurrent functional and physical design permits the physical-design team to start its process as soon as the main structure of some of the netlists is determined, which can be several months (or even a year) before the completion of the entire front-end design.

Starting physical design early enables the front-end team to refine the RTL to address problems generated by the physical-design process. Front-end designers make decisions that affect the physical design; therefore, the ideal methodology would provide them with information about the physical design on which to base those decisions. Early feedback about designs can produce a chip of much higher quality. In fact, with deep-submicron designs needing several repeater insertion delays just to cross the chip, early experimentation with the floor plan becomes essential.

Concurrent design makes sense because the main structures of a block-level netlist generally emerge early in the design operation. The remainder of functional-design time is usually spent implementing the control logic, verifying the design and making minor bug fixes, but these changes do not usually have a major impact on the behavior of the netlist in the back end. With this in mind, why not give the physical-design team access to the parts of the logic that are complete and derive the benefits from concurrent design (Fig. 3, page 32)?

In our case, concurrent front-end and back-end design enabled the RTL design team to make changes at a more convenient place in the design process and solve back-end problems more efficiently. We found this to be an important advantage.

For example, in the largest block of the design, we were plagued by congestion or hot spots that prevented a clean route. Inspection revealed that the portion of the netlist hierarchy in that area contained an enormous number of high-fanout nets acting as selects to AOI gates actually functioning as 2:1 multiplexers. Our Synopsys tools had chosen the AOI gate because it looked slightly better on paper.

Two fixes were implemented to solve this problem. We changed the synthesis script to use the infer-mux directive, which reduced the number of high-fanout nets by a factor of two. And we added a pass of buffer tree optimization to the flow for this block.

By discovering these kinds of back-end obstacles early, we caught the RTL design team while it could still make changes relatively painlessly.

In deep-submicron designs, the difference between the wire model and reality is huge, making such models difficult or impossible to use for timing convergence. Some design teams will be conservative and build in margin to cover the difference. Unfortunately, with today's processes this approach often isn't feasible.

The ReShape flow creates per-block wire-load models, which are used for synthesis only. The logic engineer ignores the synthesis timing reports (except for simple A vs. B netlist comparisons), instead converging on the post-placement timing we provide from the flow. (This placement-based timing has been correlated to at least one previous full-route/full-extraction run.)

The wire model is used only to create a netlist with the appropriate level of stiffness for optimal back-end timing convergence. We found that once it was set properly, engineers could inject new netlists into the flow and evaluate RTL or synthesis changes with real data in a few hours.

After a run, engineers want to know if the chip is converging on timing and if it has any routing-congestion problems. With the ReShape flow, we did the whole loop in just a few hours for most blocks. For example, a block with 100,000 placeable instances took about 10 hours to completely converge on timing and routing. This same process could take days in a traditional methodology. At some point the RTL converges on a final set of netlists and the push to tapeout is on.

Due to the automation and the hierarchical design process, we were able to build the entire 3dfx chip from scratch in 24 hours. Starting from gate-level netlists (netlists alone were over 1 Gbit) and with the previous floor plan and flow configuration checked out from the source tree, we spawned more than 4,000 individual jobs, with all blocks placed and routed and timing converged. The ReShape tools and the Avanti runs created more than 10,000 files.

If network or hardware problems caused a crash, the flow could be automatically restarted and would resume execution where it left off.
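
The restart behavior described above can be illustrated with a small Python sketch. This is not Hopper's code: the function name `run_flow`, the JSON state file and the step names are all assumptions made for the example. It shows only the general idea of recording each finished step so that a rerun skips completed work.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def run_flow(steps: Dict[str, Callable[[], None]],
             order: List[str],
             state_file: Path) -> List[str]:
    """Run steps in order, skipping any that an earlier run already finished.

    Returns the names of the steps executed during this call.
    """
    # Load the set of completed steps left behind by a previous (crashed) run.
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    executed = []
    for name in order:
        if name in done:
            continue  # finished earlier; do not repeat
        steps[name]()  # may raise, e.g. on a network or hardware failure
        done.add(name)
        executed.append(name)
        # Record progress after every step so a crash loses at most one step.
        state_file.write_text(json.dumps(sorted(done)))
    return executed
```

Calling `run_flow` again with the same state file after a failure resumes at the first step that did not complete.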

Automatic steps

One of the most important advantages of the ReShape flow is the automation of thousands of manual steps normally required for a hierarchical physical design. The flow provides a framework for adding special-value automation in incremental stages to solve the many problems that come up while building a block. We have identified time-consuming tasks that could be automated and we have developed code that's incorporated into the flow to handle those tasks. Automation not only saves a significant amount of design time, but it is also based on previous chip successes and on proven configurations, enabling a design team to fine-tune these settings over time to ensure the highest tool performance.

Inherited knowledge

For example, we divide the placement process into several discrete steps. The first is preparing the command file. Our flow opens our database and studies the block, using controls that have been defined according to user preferences, then automates the production of a command file that encapsulates all our learning about the best way to perform the placement tasks for that block. Any person who uses the flow inherits the benefits of any knowledge from the flow builders -- and perhaps that block's previous builder.

Another example is the automation of log file review. The log file is a critical line of communication from vendor tools to inform the user about the results of each task execution. If you do not go through every line in the log file, you may miss a single-line message, among tens of thousands of lines, that indicates a problem. The sad implications of this message may not be obvious for days or weeks. The ReShape flow has embedded log-checking software that automatically opens and reads the log, looking for indications of errors and highlighting them, and stopping the process.
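
A minimal Python sketch of this kind of log check follows. The patterns are hypothetical, since each vendor tool has its own message formats, and the function names are invented for the example; it is not the ReShape log checker. It scans every line, collects the ones that look like problems, and raises an exception so that a calling flow stops.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical patterns; a real flow would tune these per tool.
PROBLEM_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\berror\b", r"\bfatal\b", r"core dumped")
]


def find_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching a problem pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in PROBLEM_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def check_log(lines: Iterable[str]) -> None:
    """Raise if the log shows any problem, so the calling flow stops at once."""
    hits = find_problems(lines)
    if hits:
        report = "\n".join(f"line {n}: {text}" for n, text in hits)
        raise RuntimeError(f"log check failed:\n{report}")
```

A flow step could pass an open log file straight to `check_log`, since iterating over a file object yields its lines.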

No matter how good design tools are, new physical-design challenges usually emerge on new projects or with a new library or process. By nature, EDA vendors can address the needs of only some of them. But the ReShape flow creates special-purpose code to deal with special needs, then configures and “clicks it in” the flow. The flow acts as a powerful framework for adding such tools.

For example, in the 3dfx chip, there were several AGP and SDRAM buses with very tight clock-skew specs. On previous chips built by 3dfx, skew was handled with manual editing. However, if the pad ring needed to be changed -- e.g., the core size changed or the pads moved around -- the hand layout could not be used. So the manual layout would need to be redone completely to fit the new design.

To solve this problem, ReShape wrote a “point tool” to handle the AGP bus layout, and the code was replayable when changes were made to the floor plan. So we could change the physical design of the chip, make it larger or smaller with the push of a button, and all the previously configured data would replay and build the balanced buses that we needed for the AGP spec each time. This gave us the flexibility to experiment by changing the chip size without having to be concerned about the intricate layout of the balanced buses.

In fact, the entire construction of the pad ring is often one of the most manual processes in building a chip. But we have developed a library of configurable point tools that we can use to create a replayable flow for all the steps in assembling the pad ring.

The ultimate test of a flow is how quickly you are able to implement last-minute changes. One of the most important advantages of the hierarchical design flow is the ability to respin a block from a new netlist without affecting the rest of the chip. In a flat design methodology, any change to any part of the chip could potentially affect the entire chip, requiring a tremendous redesign effort that could incapacitate the chip.

The hierarchical approach provides a more deterministic path to the final physical design, allowing us to deal with last-minute netlist changes. For example, we changed 30,000 gates of the design just three weeks before tapeout. This was a critical fix, adding an important new feature to comply with the latest graphics standards, and was essential in terms of marketing the chip. Yet as important as this fix was, it only affected three blocks in the design out of a total of 22 blocks, so the engineers isolated the affected blocks and resynthesized only those three blocks without having to touch the rest of the chip -- all the other blocks were still considered on the shelf. A flat flow would have required rebuilding the whole chip -- and a potentially large delay.
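
One way to picture how affected blocks might be isolated is to compare each block's new netlist against a fingerprint of the netlist it was last built from. The Python sketch below is only an illustration of that idea under our own assumptions; the article does not say how the affected blocks were identified, and the function and block names are invented.

```python
import hashlib
from typing import Dict, List


def netlist_digest(text: str) -> str:
    """Fingerprint a netlist so two versions can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def blocks_to_respin(built: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the blocks whose netlist differs from the one last built.

    built   maps block name -> digest of the netlist used in the last build
    current maps block name -> text of the newest netlist
    """
    return sorted(
        name for name, text in current.items()
        if built.get(name) != netlist_digest(text)
    )
```

Blocks that do not appear in the result can stay "on the shelf"; only the listed blocks need to go back through the flow.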

Shrinks and variants

Even as the ReShape flow has significant short-term advantages for a chip being designed today, it has advantages for future chips. Because the flow collects and incorporates knowledge about the process and the chip during the design process, it leverages that knowledge. In this way, future chips on the same process or similar chips on a different process can stand on the shoulders of the previous work.

The proof of the value in using a flow came only a month after tapeout, when 3dfx used the flow to tape out another chip in the same process. That chip had approximately 700,000 placeable objects, and three more chips were in design with a new 0.15-micron flow at the time of 3dfx's demise.

Respins of a chip in a new process are aided by the fact that all the floor-planning information is represented so that it relates to previous work. In addition, all dimensions are specified as much as possible with process-scalable parameters such as units of wire pitches. This makes it easier to get a resynthesis of the chip up and running in a new flow with the same floor plan (only smaller, of course). We have switched a chip from 0.25 to 0.18 micron in just two days.
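
The arithmetic behind pitch-based dimensions can be sketched in a few lines of Python. The pitch values below are placeholders chosen for the example, not real design rules for any process, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    wire_pitch_um: float  # routing pitch in microns (assumed, illustrative value)


def to_microns(pitches: float, process: Process) -> float:
    """Convert a dimension stored in wire pitches to microns for one process."""
    return pitches * process.wire_pitch_um


# Assumed pitches, for illustration only.
old_process = Process("0.25-micron", wire_pitch_um=0.8)
new_process = Process("0.18-micron", wire_pitch_um=0.56)

# The floor plan stores this width once, independent of process.
block_width_pitches = 5000

old_width_um = to_microns(block_width_pitches, old_process)
new_width_um = to_microns(block_width_pitches, new_process)
```

Because the floor plan holds only the pitch count, moving to the new process changes one number in the process description rather than every coordinate in the floor plan.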

New chips often have blocks that are recycled from previous designs. In our flow, the floor-planning code for these blocks is also highly recyclable.

Managing data

A traditional hierarchical design flow also presents a data-management challenge. The number of scripts, command files and databases required increases by N-fold with an N-block hierarchical flow. Even though each block is very manageable in size and complexity, the number of jobs creates significantly more work for the design team. If any changes to the floor plan are required, such as block size changes or movement, all of the block-level scripts and command files require regeneration in the traditional hierarchical flow.

The ReShape flow centralizes, organizes and automates the generation of all these block-specific objects. Thousands of tool settings must be set to intelligent defaults; however, we have to be able to change any of them at any point in the flow. To solve this problem, we use a hierarchy of configuration files to control settings based on the process used, the particular chip being built and the particular block within the chip. Instead of hundreds of scripts with knob settings dispersed throughout, we have a small number of centralized files that can be easily kept under revision control.
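
A layered lookup of this kind can be sketched in Python as follows. The setting names and values are invented for the example and the real configuration format is not described here; the sketch shows only the precedence order, with block settings overriding chip settings, which in turn override process defaults.

```python
from typing import Any, Dict, Optional


def resolve_settings(process: Dict[str, Any],
                     chip: Dict[str, Any],
                     block: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge three levels of settings; the most specific level wins."""
    merged = dict(process)      # start from the process-wide defaults
    merged.update(chip)         # chip-level values override the defaults
    merged.update(block or {})  # block-level values override everything else
    return merged


# Hypothetical settings, for illustration only.
process_defaults = {"max_transition_ns": 0.5, "metal_layers": 6}
chip_settings = {"max_transition_ns": 0.4}
block_settings = {"placement_effort": "high"}

settings = resolve_settings(process_defaults, chip_settings, block_settings)
```

A block with no overrides simply inherits the chip and process values, which is what keeps the number of files that need editing small.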

We think of this bundle of context-dependent, variable settings as a “tech object.” When we need to bring up a new process, we can debug the settings for this tech object and then export it and do a chip respin in a new process by reinstantiating the flow with it.

The flow also allows users to share data while the block is being developed. For example, typically one person is in charge of the floor plan for the chip and the top-level power and clock distribution. This person then exports the per-block context to all the block owners on the chip. They are each able to build a complete copy of the chip from this imported context, although they typically work only on the blocks they are responsible for. In the end, the chip could be taped out from any of the block views since they all have the same pin abutments, power hookups and other global objects.

In our case, the results speak for themselves. When the 3dfx Interactive chip came back from Taiwan Semiconductor it was placed in the board and it worked at full speed -- a testament to the design team's rigorous functional and timing-verification methodology, Hopper and the ReShape physical-design flow (Fig. 4).
